AITopics | exploitation epoch

Collaborating Authors

exploitation epoch

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

Gupta, Piyush, Srivastava, Vaibhav

arXiv.org Artificial IntelligenceDec-19-2022

We propose Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm with interleaving exploration and exploitation epochs for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates the estimates for expected reward and transition probabilities. During exploitation, the latest estimates of the expected reward and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows as a sub-linear function of time.

data mining, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2209.05408

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules

Gafni, Tomer, Cohen, Kobi

arXiv.org Machine LearningJun-19-2019

We consider a class of restless multi-armed bandit (RMAB) problems with unknown arm dynamics. At each time, a player chooses an arm out of N arms to play, referred to as an active arm, and receives a random reward from a finite set of reward states. The reward state of the active arm transits according to an unknown Markovian dynamics. The reward state of passive arms (which are not chosen to play at time t) evolves according to an arbitrary unknown random process. The objective is an arm-selection policy that minimizes the regret, defined as the reward loss with respect to a player that always plays the most rewarding arm. This class of RMAB problems has been studied recently in the context of communication networks and financial investment applications. We develop a strategy that selects arms to be played in a consecutive manner, dubbed Adaptive Sequencing Rules (ASR) algorithm. The sequencing rules for selecting arms under the ASR algorithm are adaptively updated and controlled by the current sample reward means. By designing judiciously the adaptive sequencing rules, we show that the ASR algorithm achieves a logarithmic regret order with time, and a finite-sample bound on the regret is established. Although existing methods have shown a logarithmic regret order with time in this RMAB setting, the theoretical analysis shows a significant improvement in the regret scaling with respect to the system parameters under ASR. Extensive simulation results support the theoretical study and demonstrate strong performance of the algorithm as compared to existing methods.

algorithm, epoch, exploitation epoch, (15 more...)

arXiv.org Machine Learning

1906.0812

Country: Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.85)
Information Technology > Communications > Networks (0.66)

Add feedback

On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

Wei, Lai, Srivastava, Vaibhav

arXiv.org Machine LearningFeb-22-2018

We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.

algorithm, health & medicine, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1802.0838

Country: North America > United States > Michigan > Ingham County (0.14)

Genre: Research Report (0.40)

Industry:

Energy > Oil & Gas > Upstream (0.49)
Health & Medicine (0.40)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.65)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback